ABSTRACT

We present One Hand Clapping (OHC), a method for 
the detection of condition-specific interactions 
between transcription factors (TFs) from 
genome-wide gene activity measurements. OHC is 
based on a mapping between transcription factors 
and their target genes. Given a single case-control 
experiment, it uses a linear regression model to 
assess whether the common targets of two arbitrary 
TFs behave differently than expected from the 
genes targeted by only one of the TFs. When 
applied to osmotic stress data in S. cerevisiae, 
OHC produces consistent results across three 
types of expression measurements: gene expression
microarray data, RNA Polymerase II ChIP-chip
binding data and messenger RNA synthesis rates.
Among the eight novel, condition-specific TF pairs,
we validate the interaction between Gcn4p and
Arr1p experimentally. We apply OHC to a large
gene activity dataset in S. cerevisiae and provide a 
compendium of condition-specific TF interactions. 

INTRODUCTION 

Homeostasis, the ability to respond to a plethora of en- 
vironmental challenges, is vital to the cell. This adaptation 
is achieved by an orchestrated regulation of gene expres- 
sion. It was discovered that some transcription factors 
(TFs) act as master regulators in many different condi- 
tions, and that the specificity of the regulatory response 
is obtained through dispatching the signal from the master 
regulators to downstream TFs (1). It is quite clear that 
direct TF interactions (TFIs), both physical and genetic,
are the prevalent mechanisms of this dispatching (2-4). A
method for the detection of functionally relevant, 
condition-specific TFIs would therefore greatly contribute 
to our understanding of gene regulation. 

A necessary first step toward the detection of TFIs is the 
quantification of individual TF activity. It is difficult to 
deduce the activity of a TF by its expression alone [only a 
small fraction of TFs show expression levels that correlate 
with those of their target genes (5)], as there are many 
alternative mechanisms to activate TFs. A complementary 
approach is the quantification of TF-DNA binding with 
chromatin immunoprecipitation (ChIP) assays (6). Computational
approaches rely on a known TF-target interaction
graph (6,7). A linear model that describes gene
expression as the product of a position-specific activity
matrix derived from binding data and the unknown TF
activities is presented in (8). The experimental detection
of TFIs is based on techniques such as co-immuno- 
precipitation and protein binding arrays (6,9), which are 
costly and time-consuming. A statistical framework to 
deduce TF cooperativity from overrepresentation of 
common TF motifs at the promoter region of target 
genes is presented in (10,11). However, these approaches 
do not make direct use of gene expression profiles, nor are 
their predictions condition-specific. The most promising 
approaches integrate multiple sources of information, 
e.g. expression data with binding sites from ChIP. The 
idea is that if two TFs act cooperatively then there 
should exist a sufficiently large target gene set to which 
both TFs bind, and the expression profiles of these target 
genes should be similar across a series of experiments (12). 
This concept is used to rigorously assess cooperativity 
among TFs in the yeast cell cycle (13). Bar-Joseph et al.
(14) construct regulatory gene modules by requiring
co-regulation and the co-occurrence of binding sites for
a pair of interacting TFs. Beer et al. (15) cluster gene expression
profiles in a preliminary step and apply a
Bayesian classifier to predict TF modules, i.e. groups of 
TFs that act together in regulating a set of targets. 
Advanced statistical models for the integration of 
binding data and expression data are used in (16). Single 
TFs and TF sets are modeled as hidden variables in a 
sparse regression model. In this way, the authors can 
assign a significance value for the combinatorial activity 
of each TF set. Wang et al. (17) view the problem of TFI 
identification as a learning task and use Bayesian 
networks for the integration of multiple sources of 
evidence to predict cooperatively binding TFs. 

Although there are only a few studies that focus on TFIs,
genetic interactions in general have been investigated ex- 
tensively. Classically, the biological concept of genetic 
interaction (e.g. epistasis) between two components relies 
on the simultaneous perturbation of two components that 
yields an effect which is different from what one would 
expect from the perturbation of the individual compo- 
nents. This was applied at large scale in synthetic lethal- 
ity/growth defect screens like (18-20), to name a few of 
them. Typically, as many genes as possible are screened 
for interaction in an automated way by measuring the 
fitness of single and double gene deletions. Both fitness 
measures (growth and lethality) are one dimensional. It 
is still under debate how the deviation of the double 
deletion fitness from the fitness of the single deletions 
can be appropriately measured and tested in a rigid math- 
ematical framework (21,22). While this direct interaction 
measure proved to be rather fragile, the comparison of 
interaction profiles (the vector of all interaction scores of 
one gene with all others) yielded surprisingly robust and 
good results (22). Furthermore, it became evident that the 
experimental effort can be reduced considerably if not all 
pairwise combinations of the genes of interest [~5.4 
million combinations tested in (18)] are screened, and 
that even more information can be gained from measure- 
ments under different conditions. This insight is reflected 
in the work of Bandyopadhyay et al. (23) which identified 
genes interacting with DNA damage-specific partners, 
screening a comparably low number of 80000 double 
mutants. 

In the present work, we extend the concept of gen- 
etic interaction to high-dimensional phenotypes (e.g. 
genome-wide messenger RNA (mRNA) measurements, 
RNA-seq) as these become increasingly available. We for- 
mulate a mathematical concept of TFI which relies on the 
assumption that the common targets of interacting TFs 
should behave significantly different than the genes 
targeted by only one TF alone. So far, each pairwise 
genetic interaction had to be tested in an individual ex- 
periment, requiring a huge number of combinatorial per- 
turbations. Our method instead needs only one global 
intervention to the system [the fact that led to the name 
One Hand Clapping (OHC)] in the form of an environ- 
mental stimulus, and a high-dimensional gene activity 
readout in order to score all pairwise TFIs. As in the 
case of synthetic genetic arrays, we compare the 
obtained interaction profiles between TFs to obtain 
reliable and stable predictions. A first proof of concept
of this method was given in (24), where we applied OHC
to transcriptional activity data obtained under osmotic 
stress. Here, we establish a solid methodological basis 
and provide a proof of its universal applicability. After 
benchmarking the performance of OHC, we construct a 
compendium of high confidence, condition-specific TFIs 
based on a large gene expression screen (25). Finally, we 
validate two of the novel TFI predictions under osmotic 
stress, one of them in silico, the other one in vivo. OHC is 
available as an open source, user-friendly R package (see 
Supplementary data). The current best practice in the 
study of gene regulation, consisting of quantification of 
differential expression and gene set enrichment analysis, 
can now be extended by the screening for combinatorial 
TF activity. 

MATERIALS AND METHODS 

TFI model 

Let there be gene activity measurements $e_g$ for all genes
$g \in G$, where $G$ is the set of all genes of the organism. In our case,
the values $e_g$ are the log folds of the activity in a perturbation
experiment versus a wild-type control. Suppose
we knew all TF-target relations (for a discussion of how to
obtain such a TF-target annotation, see the next subsection).
For each TF $T$, we would then have a binary indicator
function $I(g \in T)$ taking the value 1 if gene $g$ belongs to
the target set of $T$ and 0 otherwise. The main idea of
our method is to divide the set of all genes into
four subsets (Figure 1): those genes that are targeted by
neither of the two TFs, those that are targeted by only one of
the TFs (one subset per TF) and those that are targeted by both TFs. Apart
from a possible baseline shift $\beta_0$ in gene activity, TF $T_j$
alone is assumed to have an effect $\beta_j$ on its targets
($j = 1, 2$). Disregarding the baseline shift, the common
targets of $T_1$ and $T_2$ are expected to show a change in
activity that amounts to $\beta_1 + \beta_2$ if the two TFs do not
interact. The deviation from this expectation is quantified
by the interaction term $\beta_{12}$, which is of primary
interest. Formally, this can be cast as a second-order
linear regression of $e_g$ versus the covariates $I(g \in T_1)$ and
$I(g \in T_2)$,

$$e_g \sim \beta_0 + \beta_1 I(g \in T_1) + \beta_2 I(g \in T_2) + \beta_{12} I(g \in T_1) I(g \in T_2),$$

with $g \in G$. The regression is performed for each TF pair
separately, since including more TFs and their interaction
terms would lead to overfitting. This cannot be alleviated
by using regularization methods such as ridge regression
or lasso regression (data not shown). Running the regression
in an all-against-all fashion for a set of TFs $\mathcal{T}$ results
in a symmetric $|\mathcal{T}| \times |\mathcal{T}|$ interaction matrix $M$ containing
all interaction terms $\beta_{12}$ (see Supplementary Figures
S3-S6). We noticed that the interaction terms alone are
not strong predictors of interaction (data not shown). There are
three possible explanations for this: the definition of
the target sets $T_1$ and $T_2$ is imperfect, the expression measurements
are prone to unsystematic variation, or the model
of TF activity might be too simplistic in some cases.
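
Because both covariates are binary and their product is included, the regression above is saturated: its least-squares solution can be written directly in terms of the mean activity of the four gene subsets. The following Python sketch illustrates this for a single TF pair. The function name `fit_pair` and the toy values are hypothetical and are not taken from the OHC R package; the sketch also assumes that all four subsets are non-empty.

```python
from statistics import mean

def fit_pair(e, targets1, targets2):
    """Least-squares fit of e_g ~ b0 + b1*I1 + b2*I2 + b12*I1*I2 for one TF pair.

    e        : dict mapping gene -> log fold activity
    targets1 : set of target genes of the first TF
    targets2 : set of target genes of the second TF
    Returns (b0, b1, b2, b12).
    """
    # Split the genes into the four subsets: neither, only T1, only T2, both.
    groups = {(0, 0): [], (1, 0): [], (0, 1): [], (1, 1): []}
    for gene, value in e.items():
        groups[(int(gene in targets1), int(gene in targets2))].append(value)
    if any(len(values) == 0 for values in groups.values()):
        raise ValueError("all four gene subsets must be non-empty")

    m00 = mean(groups[(0, 0)])  # targeted by neither TF
    m10 = mean(groups[(1, 0)])  # targeted by T1 only
    m01 = mean(groups[(0, 1)])  # targeted by T2 only
    m11 = mean(groups[(1, 1)])  # common targets

    # In a saturated model the fitted values equal the subset means.
    b0 = m00
    b1 = m10 - m00
    b2 = m01 - m00
    b12 = m11 - m10 - m01 + m00  # deviation from the additive expectation
    return b0, b1, b2, b12

# Toy data (illustrative values only).
toy_e = {"g1": 0.0, "g2": 0.2, "g3": 1.0, "g4": 1.2,
         "g5": -0.5, "g6": -0.3, "g7": 2.0, "g8": 2.2}
toy_t1 = {"g3", "g4", "g7", "g8"}
toy_t2 = {"g5", "g6", "g7", "g8"}
coeffs = fit_pair(toy_e, toy_t1, toy_t2)
print(coeffs)
```

In this toy example the common targets change by more than the sum of the two single-TF effects, which gives a positive interaction term.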
Figure 1. Schematic description of the linear regression model: for two
TFs $T_1$ and $T_2$, expression of all genes that are targets of $T_1$ is
described by coefficient $\beta_1$ (cyan), expression of genes that are targets
of $T_2$ is described by coefficient $\beta_2$ (red) and expression of genes that
are targets of both TFs is described by coefficient $\beta_{12}$ (green). $\beta_0$ (white)
is the coefficient for the baseline activity. It is connected to all genes
including those that are targets of neither $T_1$ nor $T_2$ (white circles).
Remaining connections are symbolized by dotted lines. Circles at the
bottom symbolize genes and are colored according to the TF that
targets them. The whole formula of the linear regression is
shown at the top, with the relevant parts highlighted at the bottom.
For a detailed description of the model, refer to 'Materials and 
Methods' section. 



TF annotation 

One cornerstone for finding TFIs by looking at commonly 
regulated target genes is the availability of a sufficiently 
accurate TF-target gene mapping. Such a mapping is 
rarely available, especially for different growth conditions. 
This is a limitation of the method that will hopefully be 
alleviated with the advent of ChIP-seq data of TFs in
many organisms, as they are being generated by the 
ENCODE and modENCODE consortia (26-28). 

For Saccharomyces cerevisiae, there are fortunately 
several high-quality TF-target mappings available.
TF-target relations mined from a manually curated litera- 
ture repository can be found in the YEASTRACT 
database (29) which is used in this work. We filter this 
annotation removing TFs with <10 annotated target 
genes. This leaves 165 TFs with a median of 167 annotated 
genes per TF. The size distribution of the annotated gene 
groups is shown as Supplementary Figure S1.
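
The filtering step can be sketched in a few lines of Python. The mapping `toy_annotation` and the function name `filter_annotation` are hypothetical; only the threshold of 10 target genes comes from the text, and the sketch does not reproduce how YEASTRACT is actually parsed.

```python
from statistics import median

def filter_annotation(annotation, min_targets=10):
    """Keep only TFs with at least `min_targets` annotated target genes."""
    return {tf: set(genes) for tf, genes in annotation.items()
            if len(genes) >= min_targets}

# Toy TF-target annotation (illustrative only).
toy_annotation = {
    "TF_A": {f"g{i}" for i in range(12)},
    "TF_B": {f"g{i}" for i in range(3)},   # fewer than 10 targets, removed
    "TF_C": {f"g{i}" for i in range(20)},
}
kept = filter_annotation(toy_annotation)
sizes = [len(genes) for genes in kept.values()]
print(sorted(kept), median(sizes))
```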

Supplementary Figure S2A shows a box plot of expres- 
sion folds (total fraction) of the TFs from YEASTRACT. 
Prominent differentially expressed TFs are explicitly 
shown (XBP1, MAGI, SIP4, CIN5, NRG1, CUP2, 
TEC1, ASH1 and BAS1). Most of these outliers are not 
directly involved in the salt stress or general stress 
response pathways, confirming that TF activity is not 
regulated at the transcriptional level. 

When looking at the coefficients $\beta_1$, $\beta_2$ and $\beta_{12}$ from the
regression model of all TF pairs in the YEASTRACT 
database, there is no apparent structure (Supplementary 
Figure S2B). Closer investigation reveals extreme values 
that are due to pairwise interactions between a small set of 
four TFs (Hot1p, Sps18p, Gis1p and Gat4p, see
Supplementary Figure S2C). Indeed these TFs have
target genes that are strongly differentially expressed,
thus giving rise to a high $\beta_{12}$ coefficient for every TF
having a considerable overlap with one of these four
TFs. The mean expression of their target genes is
above that of the target genes of all other TFs (Supplementary Figure
S2C). A Gene Ontology analysis revealed that they
are stress responder genes involved in response to
various stimuli and to heat shock (Supplementary
Table S1). We removed these four outlier TFs from the
TF-target graph, leaving us with a final annotation containing
161 TFs.

TFI prediction 

To arrive at robust TFI predictions, we use a 
'guilt-by-association' principle that has been commonly 
applied in genetic interaction screens (18). Instead of 
comparing single interaction values, we compare the inter- 
action profiles of each TF (the rows of the interaction 
matrix M) by means of their correlation. More specifically,
we use 1 - Pearson correlation as a distance measure.
We apply hierarchical average linkage clustering to the 
rows of M using this distance measure. The resulting clus- 
tering dendrogram is shown in Supplementary Figures 
S7-S10. The two descendants of the terminal branches 
of this dendrogram define our TFI predictions 
(Supplementary Algorithm 1). The reasoning behind this 
is that we expect many TFs to have at least one interaction 
partner in a given condition, and the most likely partner is 
the one with the most similar interaction profile. 
Alternatively, we tried to predict TFIs based on P-values 
derived from a null distribution of the correlation dis- 
tances. Such null distributions can be either derived 
from Pearson's Product Moment Coefficient (30) or, 
more conservatively, from resampling procedures 
(shuffling target genes). Still our simple clustering proced- 
ure works best in terms of area under the curve (AUC) 
(data not shown; for a definition of AUC see section 
'Results'). 
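
The prediction step described above can be sketched as follows. This is a naive pure-Python illustration, not the code of the OHC R package: the function names and the toy profiles are hypothetical, the profiles are assumed to have non-zero variance, and how the diagonal of M is treated is left open here. A pair is reported whenever the clustering merges two single TFs, i.e. when both descendants of a branch are leaves.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equally long numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def terminal_pairs(profiles):
    """Average-linkage clustering with distance 1 - Pearson correlation.

    profiles : dict mapping TF name -> interaction profile (row of M)
    Returns the list of merges in which both clusters are single TFs.
    """
    names = list(profiles)
    dist = {}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            dist[frozenset((a, b))] = 1.0 - pearson(profiles[a], profiles[b])

    def linkage(c1, c2):
        # Average linkage: mean distance over all cross-cluster pairs.
        total = sum(dist[frozenset((a, b))] for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    clusters = [(name,) for name in names]
    pairs = []
    while len(clusters) > 1:
        # Find the two closest clusters.
        _, i, j = min((linkage(c1, c2), i, j)
                      for i, c1 in enumerate(clusters)
                      for j, c2 in enumerate(clusters) if i < j)
        c1, c2 = clusters[i], clusters[j]
        if len(c1) == 1 and len(c2) == 1:
            pairs.append((c1[0], c2[0]))  # leaf-leaf merge: predicted TFI
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(c1 + c2)
    return pairs

# Toy interaction profiles (illustrative values only).
toy_profiles = {
    "A": [1.0, 2.0, 3.0, 4.0],
    "B": [1.1, 2.1, 2.9, 4.2],
    "C": [4.0, 3.0, 2.0, 1.0],
    "D": [3.9, 3.2, 2.1, 0.8],
    "E": [1.0, -1.0, 1.0, -1.0],
}
pairs = terminal_pairs(toy_profiles)
print(pairs)
```

In the toy example, A pairs with B and C pairs with D, while E joins an existing cluster and therefore receives no predicted partner.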

Gene activity data 

In this article, we use several datasets as input to our OHC 
method. First, we use mRNA expression data from a time 
course experiment exposing a wild-type yeast strain to 
osmotic stress by adding 0.8 M NaCl (see (24) for more 
details). The article provides standard total mRNA 
expression data after 36 min of osmotic stress (dataset 
D1), as well as the corresponding measurements of
'newly synthesized' mRNA (dataset D2), which are 
roughly proportional to the mRNA synthesis rates at 
the time of measurement. Throughout this article, we 
always mean log expression folds (log quotient of expres- 
sion under the experimental condition against expression 
in the control experiment) when referring to expression 
data. To test the reliability of our method, we included 
an unrelated gene expression dataset generated by 
Mitchell et al. (31) obtained from S. cerevisiae under
osmotic stress. The total mRNA expression level (corres- 
ponding to the total fraction of Miller et al.) 30 min after 
addition of 0.8 M NaCl was measured (dataset D3). The 
gene expression datasets from Miller et al. and Mitchell 
et al. should be highly comparable, since the same yeast
strain, the same microarray platform and a similar
protocol were used. Microarray data were downloaded
as raw files from GEO (32) (accession number:
GSE15936) for Mitchell et al. data and from
ArrayExpress (accession number: E-MTAB-439) for 
Miller et al. Normalization was performed using gcrma 
(33) (as implemented in R/Bioconductor (34)) without 
quantile normalization, since we expect global effects of 
the perturbation on mRNA expression. As a completely 
different way of assessing gene activity, Miller et al. (24) 
also provide RNA polymerase II (Pol II) occupancies 
from ChIP-chip experiments 24 min after addition of
salt. We use their Pol II mean occupancy on each gene 
(between transcription start site and polyadenylation site) 
as another proxy for gene activity (dataset D4). 
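
As a small illustration of the quantity used throughout, the log expression fold of a gene is the logarithm of the ratio between its activity under the experimental condition and in the control. The sketch below assumes base 2, which the text does not specify, and uses hypothetical names and toy values.

```python
from math import log2

def log_fold(treated, control):
    """Log2 ratio of treated versus control activity for genes present in both."""
    return {gene: log2(treated[gene] / control[gene])
            for gene in treated if gene in control}

# Toy intensities (illustrative only).
toy_treated = {"g1": 8.0, "g2": 1.0}
toy_control = {"g1": 2.0, "g2": 4.0}
folds = log_fold(toy_treated, toy_control)
print(folds)
```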

Yeast strains and growth assays 

The S. cerevisiae deletion strains hog1Δ, arr1Δ and
gcn4Δ, as well as the wild-type strain BY4741 were
obtained from Open Biosystems (Huntsville, USA). The
double deletion strain arr1Δ/gcn4Δ was generated by
integrating a ClonNat cassette in the ARR1 locus of the
gcn4Δ strain. Correct gene disruptions were verified by
polymerase chain reaction (PCR). Spot dilutions were 
done to assess fitness and growth under osmotic stress. 
Equal amounts of freshly grown yeast cells in YPD were 
re-suspended in water, 10-fold dilutions were spotted on 
YPD plates and YPD plates with 1.2 M NaCl. Plates were 
incubated for 4 days at 30°C. Results are found in 
Supplementary Figure S11.

Gene expression microarrays 

Overnight cultures were diluted in fresh synthetic complete 
medium with 2% glucose to OD600nm = 0.1 (120 ml
cultures, 160 rpm shaking incubator, 30°C). In the early
log phase (OD600nm = 0.8), 20 ml of the culture was harvested
by centrifugation (no salt stress sample).
Afterwards, NaCl was added to the remaining culture to
a final concentration of 0.8 M; 30 min after addition,
20 ml of culture was harvested (salt stress sample).
Total RNA was prepared after cell lysis using a
FastPrep-24 instrument (Millipore) and subsequent
purification using the RiboPure-Yeast Kit (Ambion)
following the manufacturer's instructions. All following
steps were conducted according to the Affymetrix
GeneChip 3' IVT Express Kit protocol. Briefly, one-cycle
complementary DNA (cDNA) synthesis was performed
with 300 ng of total RNA. In vitro reverse transcription
labeling was performed for 16 h. The fragmented samples
were hybridized for 16 h on 'Yeast Genome 2.0'
expression arrays (Affymetrix), washed and stained using
a Fluidics 450 station and scanned on an Affymetrix
GeneArray scanner 3000 7G. Microarray data have been
deposited to the ArrayExpress database
(http://www.ebi.ac.uk/microarray) under accession number
E-MEXP-3566.



RESULTS 

OHC accurately predicts pairwise TFIs 

We first applied OHC to mRNA expression data from the 
total mRNA fraction of Miller et al. (dataset D1) using
the filtered YEASTRACT database as TF-target annota- 
tion (see 'Materials and Methods' section). The resulting 
interaction matrix is shown as a heatmap (Supplementary 
Figure S3). The rows of the matrix were clustered and TFI 
predictions were made as described in 'Materials and 
Methods' section. We predict 59 mutually disjoint TFI 
pairs, while for 43 single TFs no interaction partners 
were predicted. Validation of the predictions was done 
through the BioGRID database [(35), version 3.1.71]. It 
contains physical and genetic interactions for many yeast 
proteins that were derived from high- and low-throughput
experiments in the literature. The subgraph of BioGRID 
corresponding to interactions between TFs, as well as their 
degree distribution, is shown in Supplementary Figure 
S12. From the 59 predicted TFIs, we validate 13 of them 
as listed in BioGRID (a positive predictive value of 22%). 
A complete list of predicted and validated pairs can be 
obtained with the Supplementary Code. Validated TFI 
predictions had a significantly lower correlation distance 
than unvalidated TFIs (Wilcoxon's test, P-value 0.004). 
This shows that interacting TF pairs are more closely 
related (considering our interaction measure and 
distance function) than unvalidated predictions. This is 
further investigated through an ROC plot (Figure 2A). 
The AUC (76%) shows a strong deviation from random 
predictions (diagonal) and shows that the profile correl- 
ation measure can serve as a proxy for predicting inter- 
actions. To better assess the performance of the clustering 
and prediction algorithm, we verified the overrepre- 
sentation of validated prediction using Fisher's test 
(P-value $<10^{-5}$, odds ratio 5.291, with a 95% confidence
interval [2.67;10]). When testing only genetic or physical
interactions from BioGRID (P-values $<10^{-5}$ and 0.003,
respectively), we find a bias toward prediction of genetic 
interactions, as defined by BioGRID. 
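
The AUC can be read as the probability that a validated pair has a smaller correlation distance than an unvalidated one. The sketch below computes it in this rank-based form; it assumes that pairs are ranked by their correlation distance alone, and the function name and toy distances are hypothetical rather than values from this study.

```python
def auc_from_distances(validated, unvalidated):
    """AUC as the fraction of (validated, unvalidated) pairs of distances in
    which the validated distance is smaller; ties count as one half."""
    wins = 0.0
    for v in validated:
        for u in unvalidated:
            if v < u:
                wins += 1.0
            elif v == u:
                wins += 0.5
    return wins / (len(validated) * len(unvalidated))

# Toy correlation distances (illustrative only).
toy_validated = [0.1, 0.3]
toy_unvalidated = [0.2, 0.4, 0.5]
auc = auc_from_distances(toy_validated, toy_unvalidated)
print(round(auc, 3))
```

An AUC of 0.5 corresponds to the diagonal of the ROC plot, i.e. to random predictions.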

We tested the consistency of predictions on incomplete 
TF-target annotations by removing an increasing 
precentage of TFs from the annotation. We measured 
the agreement of predictions on the smaller TF annotation 
with predictions made on the original annotation 
Supplementary Figure SI 6). In addition, we measured 
the performance as the number of validated pairs accord- 
ing to BioGRID. Expectedly the drop in agreement is 
stronger than the drop in performance, because 
removing one TF from a pair will regroup the remaining 
TF with another with high probability, thus changing the 
predictions. Simultaneously performance decreases more 
slowly, showing that the regrouping of TFs leads to new 
validated pairs. After removal of 20% of TFs, perform- 
ance merely drops from 22% to 18%. 

OHC is stable on a wide range of gene activity data 

To test the stability of our method we applied it to the 
mRNA expression data of the labeled fraction from the 
same osmotic stress experiment used previously (termed 
Figure 2. Validation of OHC predictions. (A) ROC curve using TFIs in BioGRID as benchmark; colored area and horizontal lines are confidence
intervals of sensitivity and specificity, respectively. (B) Overlap between predicted and validated pairs between all datasets. D1: total mRNA fraction,
D2: labeled mRNA fraction, D3: mRNA data from Mitchell et al. and D4: Pol II ChIP-chip occupancy measurements; predictions across datasets
agree well, dataset D4 having the most distinct predictions. Each intersect is shown as a circle, with radius proportional to the intersect size. The
black box shows the intersect used as novel predictions. The numbers in parentheses indicate the subset of interactions that are validated by
BioGRID. (C) Pairwise comparison of expression or occupancy values for all genes. Numbers in lower part indicate Spearman's correlation
between datasets. D1 and D3 have the highest correlation as they are both total mRNA expression measurements. D2 has a very good but
lower correlation with D1 and D3. This is due to subtle differences when measuring newly synthesized mRNA. D4 has a very weak positive
correlation as Pol II occupancy as an indicator of gene activity is very different from mRNA expression.



dataset D2, see 'Materials and Methods' section). Both 
datasets are similar (Spearman's ρ = 0.85, Figure 2C)
and we expect similar results. On this dataset, we predict
60 pairwise interactions, 11 validated by the BioGRID
database (18% prediction accuracy; validated pairs:
Nrg1p-Nrg2p, Fhl1p-Ifh1p, Stp1p-Stp2p, Msn2p-Msn4p,
Mbp1p-Swi4p, Ecm22p-Upc2p, Cbf1p-Met28p,
Ndt80p-Sum1p, Arg80p-Arg81p, Hap3p-Hap5p and
Mga2p-Spt23p). The validated interactions highly agree
between both datasets, eight pairs being validated by
both runs (Figure 2B). The interactions Ace2p-Swi5p,
Ecm22p-Mot3p, Pdr1p-Pdr3p, Mbp1p-Skn7p and
Flo8p-Phd1p found in the first dataset are lost in the
second, whereas the interactions Mbp1p-Swi4p, Ecm22p-Upc2p
and Cbf1p-Met28p in the second are not present in the
first dataset. Comparison of all predicted interactions
(Figure 2B) features an overlap of 23 pairwise interactions 
(38%). 

Reproducibility was tested by running the method on 
another osmotic stress dataset from (31) (mRNA expres- 
sion measurement 30 min after addition of NaCl) termed 
D3 (Spearman's ρ = 0.88; Figure 2C). The method
predicts 60 pairwise interactions and 14 validated inter- 
actions (23%). The overlap with the previous two 
datasets is 26 and 23 pairs for datasets D1 and D2, respectively.
Validated interactions agree strongly; they
overlap at 12 and 8 validated interactions for D1 and
D2, respectively (Figure 2B). It is interesting to notice
that the datasets D3/D1 agree more closely than D3/D2
and D1/D2. This might be due to the fact that D1 and D3
measure the total mRNA at the extraction timepoint and
thus include mRNAs transcribed before the onset of stress
and not yet degraded, contrary to D2 which corresponds
to the labeled mRNA fraction and thus contains only
mRNAs transcribed after the onset of stress. Indeed,
D1/D3 have a higher correlation than D1/D2 and D3/D2
(Figure 2C).

To show that the method also works on proxies of gene 
activity other than mRNA expression measurements, we 
used the Pol II ChIP-chip data from (24) (termed D4). On
this dataset, the method predicts 57 interactions, 12 of
which can be validated (21% accuracy). Its performance
is thus comparable to the performance on mRNA expression
data. The predictions vary strongly as there are only
12, 10 and 12 predicted interactions shared with the
datasets D1, D2 and D3, respectively (Figure 2B). This
is due to the low correlation between D4 and the datasets D1 to
D3, which varies between 0.16 and 0.3 (Spearman's rank correlation,
see Figure 2C). Despite the low correlation, a
core of eight interactions is shared between all datasets 
(including four novel predictions) and shows that the 
method is robust enough to adapt to various measures 
of gene activity. 
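
The positive predictive values quoted for the four datasets follow directly from the reported counts of predicted and BioGRID-validated pairs. The short Python check below reproduces that arithmetic; the variable names are illustrative.

```python
# (number of predicted pairs, number validated by BioGRID) as reported in the text
results = {"D1": (59, 13), "D2": (60, 11), "D3": (60, 14), "D4": (57, 12)}

# Positive predictive value: validated pairs divided by predicted pairs.
ppv = {name: validated / predicted
       for name, (predicted, validated) in results.items()}

for name, value in ppv.items():
    print(f"{name}: {value:.0%}")
```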

The method can also readily be applied to data from 
other organisms. Supplementary Figure S15 and accompanying
text describe the application to human pancreatic
cancer data. Using a TF annotation containing 153 TFs, 
OHC predicts 57 putative interactions of which 7 can be 
validated (12% positive predictive value). The difference 
in performance compared with yeast data might be 
attributed to an incomplete annotation of human TFIs 
in BioGRID. 

OHC finds cis and trans TFIs 

We distinguish between two main types of combinatorial 
TFIs: cis-regulatory interactions and trans-regulatory
interactions (36). Cis interactions are mediated by a
specific TF binding site configuration at the cis-regulatory
region of a gene, possibly resulting in cooperative or competitive
binding of TFs. Competitive binding occurs when
two TFs share a common or overlapping binding motif.
Cooperative binding of TFs occurs if two TFs are required 
to bind simultaneously to be functional, or if the binding 
of the second TF is enhanced by the binding of the first 
TF, which is the case, e.g. for nucleosome-mediated 
cooperativity (37). 

Trans interactions are defined as direct protein-protein 
interactions of both TFs prior to DNA binding, either by 
forming a protein complex or by complex formation with 
other co-factors involved in Pol II recruitment and tran- 
scription initiation. 

TF pairs predicted by our method on dataset Dl and 
validated by BioGRID include the following types of 
interaction: Ace2p-Swi5p (38) and Sum1p-Ndt80p (39)
undergo competitive cis-regulatory interactions, the
former having identical binding sites, the latter having
overlapping binding sites. Mot3p-Ecm22p (40),
Mbp1p-Skn7p (41), Arg80p-Arg81p (42), Hap3p-Hap5p
(43) and Pdr1p-Pdr3p (44) are all examples of trans-regulatory
protein interactions forming prior to DNA binding.
The pair Ifh1p-Fhl1p represents a special type of trans
interaction. Fhl1p is by default bound to the promoter
of ribosomal protein genes without influencing transcription.
The phosphorylation of Ifh1p enables the binding
and activation of Fhl1p (45).

Three interactions [Msn2p-Msn4p (46), Mga2p-Spt23p 
(47) and Stp1p-Stp2p (48)] could not be categorized unambiguously.
They consist of homologous or functionally
redundant proteins, implying that both cis and trans inter- 
actions could serve as regulatory mechanism. We call 
these interactions 'ambiguous'. 

OHC provides a compendium of condition-specific TFIs 

No changes to the model are required when
applying the method to large datasets containing
gene activity measurements under diverse conditions.
Consequently, we ran the method on mRNA expression
data from 173 experiments [data compiled by Gasch et al.
(25)], which are grouped into 16 conditions with at least five
experiments each. Clustering the experiments according to the
correlation of the expression profiles across conditions
recovers the grouping into 16 conditions defined above.
Similarly, clustering the predictions made by OHC on 
each experiment according to the number of common 
TFIs between experiments recovers the condition classes 
as well (Supplementary Figure S13). This demonstrates
that predictions by OHC are truly condition specific and 
reproducible. 

We compiled a compendium of confident condition- 
specific TFIs. For each condition, we selected the OHC 
interactions that are found in more than half of the ex- 
periments for that condition. This compendium is 
provided as Supplementary Data. The graph representa- 
tion of this compendium (Figure 3) is sparsely connected 
with many isolated pairs. The number of conditions for a 
pair of TF is encoded by edge width, indicating the speci- 
ficity of the interaction. Due to false negatives and the 
limited variety of environmental conditions in (25), our 
network is far from being complete, and too sparse to 
be conclusive about its topological properties, such as 
edge degree distribution and connectivity. Yet, it high- 
lights an important organizational feature of signaling 
pathways, namely a functional hierarchy, where informa- 
tion is flowing from general to specific regulators: some 
TF pairs interact in more than one condition. Most of
them are either protein complexes (e.g. Hap2p-Hap3p-Hap5p),
form heterodimers (e.g. Arg80p-Arg81p and
Ino2p-Ino4p) or are highly similar or homologous TFs (e.g.
Nrg1p-Nrg2p, Msn2p-Msn4p, Mga2p-Spt23p and
Upc2p-Ecm22p). These interactions are detected in multiple
conditions simply because the activation of one interaction partner
leads to complex formation. For the other TFs, which interact
with different partners, both TFs of a pair need to be active under the
same condition for an interaction to be predicted. This is
the case for the interaction between Skn7p and Stb5p,
which is exclusively predicted by OHC under diamide
treatment; this seems plausible as both have a role in
the oxidative stress response (49,50). Skn7p is the more
general TF while Stb5p is diamide specific. Indeed, STB5
null mutants have a decreased resistance to diamide (51).
Another interesting finding is the TF Tye7p which, as our
Figure 4. (A) Motifs of CIN5 and YAP6 from YeTFaSCo database. Both motifs are very similar, which is confirmed by motif search, as a test for
co-occurrence of both motifs is significant (see text). (B) Growth assay on YPD plates and YPD plates containing 1.2 M NaCl after incubating for
4 days at 30°C. Only cell growth at a dilution of 1:100 is shown and the relevant parts have been extracted from the image. The full image of the
experiment including replicates and negative controls can be found as Supplementary Figure S11. Growth phenotype is not affected in YPD medium.
Wild-type cells have a decreased growth phenotype under osmotic stress and arr1Δ mutants show a strong decrease in growth phenotype, while gcn4Δ mutants
and double mutants do not. This shows the synthetic rescue of the effect of the knockout of ARR1 in the double mutant. (C) Hypothetical model
explaining the observations from the growth assay experiments (B). This model is focused on the genes that respond positively to osmotic stress
(symbolized by the line with arrowhead). A double inhibition chain of Arr1p ⊣ Gcn4p leads to the observed phenotypes under osmotic stress: in
wild-type cells, the inhibitory effect of Gcn4p is prevented by Arr1p and the cells grow normally. The same observation is made when knocking out
GCN4 as the logic of regulation does not change. The knockout of ARR1 relieves the inhibition on Gcn4p, which in turn downregulates the target
genes. We speculated that this is causing problems with osmo-adaptation, leading to a reduced cell growth. The double mutant rescues that
phenotype as the genes are only driven by the osmotic stress (as in wild type). (D) Log-expression values of candidate genes responsible for synthetic 
rescue across all arrays. All candidates are affected by the knockout of ARR1. Four candidate genes are uncharacterized ORFs, two are proteins of 
unknown function and the rest has a variety of different roles in different pathways. The genes have not yet been linked to osmotic stress. 



model postulates, regulates glycolysis together with Gcr1p
under H2O2 exposure and together with Rgt1p when cells
are exposed to dithiothreitol. Condition-specific
transcriptional control is achieved by activating Tye7p
under several oxidative stress inducing agents and
specifically pairing it with TFs that are only active under
one such agent. OHC helps in discovering this type of
combinatorial gene regulation.

Novel predictions of TFIs can be validated experimentally 

Novel predictions are defined as consensus predictions
between datasets D1, D2 and D3 (indicated by a black
box in Figure 2B). We left dataset D4 out because of its
low correlation with the rest of the data. This gives us
eight novel predictions, namely the pairs Cin5p-Yap6p,
Gcn4p-Arr1p, Zap1p-Spt2p, Sko1p-Sok2p, Hsf1p-Aft1p,
Sip4p-Cdc14p, Cup2p-Yrr1p and Rim101p-Otu1p.

Cin5p and Yap6p bind competitively 

We noticed that Cin5p and Yap6p have very similar
binding motifs (Figure 4A) according to the YeTFaSCo
database (52), from which we chose the motifs with high
expert confidence. They are derived from ChIP-chip data
by Harbison et al. (6) and MacIsaac et al. (7) for CIN5
and YAP6, respectively.

We searched for both motifs using these position-specific
weight matrices (PWMs) and the MEME suite
(53) (FIMO version 4.7.0 using default parameters for
P-value and q-value thresholds) on intergenic regions
defined by (6). A test for co-occurrence of both motifs
across all intergenic regions is highly significant
(P-value < 10^-5). We found 135 intergenic regions where
both motifs have one or several matches. In this set, we
find 149 competitive matches, where the distance between
the two motif occurrences is 0, and 36 cases with five or
more nucleotides between motif occurrences. Motif search
also shows that the TFs can bind alone, as for some
intergenic regions only a match for a single TF falls
below the P-value threshold. As there are protein-binding
microarray [PBM (9)] derived motifs for each TF, we
deduce that both proteins can bind DNA on their own.
The motif similarity from ChIP-chip data is thus not due
to a protein complex between Cin5p and Yap6p and we
conclude that both TFs bind competitively to the
promoter of their common target genes.
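
The text does not state which statistical test was used for motif co-occurrence. As a minimal sketch, assuming a one-sided Fisher exact test on a 2x2 table of intergenic regions (match for both motifs, for only one motif, or for neither), the P-value could be computed as below. The function name and the toy counts are hypothetical and are not the counts from this study.

```python
from math import comb


def cooccurrence_p_value(n_both, n_only_a, n_only_b, n_neither):
    """One-sided Fisher exact test for motif co-occurrence (assumed test).

    Arguments are counts of intergenic regions with a match for both
    motifs, only motif A, only motif B, or neither motif.
    """
    n_a = n_both + n_only_a              # regions with a match for motif A
    n_b = n_both + n_only_b              # regions with a match for motif B
    total = n_a + n_only_b + n_neither   # all intergenic regions
    # Hypergeometric upper tail: probability of observing at least
    # n_both shared regions if the two motifs occurred independently.
    tail = sum(
        comb(n_a, k) * comb(total - n_a, n_b - k)
        for k in range(n_both, min(n_a, n_b) + 1)
    )
    return tail / comb(total, n_b)


# Hypothetical toy counts, for illustration only.
p_enriched = cooccurrence_p_value(8, 2, 2, 88)
p_independent = cooccurrence_p_value(1, 9, 9, 81)
print(p_enriched, p_independent)
```

A small P-value indicates that the two motifs share more intergenic regions than expected by chance; on its own it does not distinguish overlapping (competitive) matches from adjacent ones, which is why the distance between matches is examined separately.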

The other novel predictions do not show such clear
evidence for an interaction based on their motifs, so we
decided to perform experimental validation for one
additional pair. We chose the pair Gcn4p-Arr1p because
these interaction partners have the largest target sets
among all predicted pairs (as defined by YEASTRACT, 1260
and 743 target genes for Gcn4p and Arr1p, respectively).

GCN4/ARR1 show a synthetic rescue phenotype

To validate the interaction between GCN4 and ARR1, we
performed a classical genetic interaction screen (Figure 4B
and Supplementary Figure S11). We assayed the growth
of a wild-type strain as well as single and double deletion
strains in rich medium (YPD) and under osmotic stress
(YPD + 1.2 M NaCl). The single deletions had no effect
in rich medium, consistent with the condition specificity
of the prediction. While wild type and gcn4Δ grew normally
under osmotic stress, arr1Δ showed a strong decrease
in cell growth. The growth defect is rescued in the
double deletion strain gcn4Δ/arr1Δ. This indicates an
interaction between both proteins, though the experimental
design cannot distinguish between a cis and a trans
interaction.

Our current working hypothesis on the mechanism
of the interaction is shown in Figure 4C. We expect
most genes commonly regulated by both TFs to be salt
stress responders (because of the condition specificity of
OHC). We know from previous experiments (24) that
Gcn4p acts as a repressor under osmotic stress. By
positioning Arr1p upstream of Gcn4p as its inhibitor, this
model explains our observations from the growth assay
experiments. The removal of Arr1p from the system
probably causes genes important for osmo-adaptation
to be repressed by Gcn4p, reducing the cell growth rate. The
removal of Gcn4p has no noticeable effect in this model.
The double knockout reestablishes conditions close to
wild type, where genes are only regulated by the osmotic
stress response.

We performed mutant cycle analysis [see (54)] to
elucidate the mechanism of this interaction. Briefly,
transcriptional profiling was done for single and double
deletion strains, before and after exposure to osmotic
stress conditions (0.8 M NaCl; see Supplementary Figure S14
for a comparison of all profiles). For each gene, its
expression under osmotic stress was explained by a linear
model accounting for an effect of the GCN4 deletion, an
effect of the ARR1 deletion and their interaction effect.
We selected the genes whose interaction effect was positive
and larger than log2(1.5) (45 genes). Then, we filtered this
group for genes showing a decrease in expression in the
arr1Δ arrays and an expression similar to wild type in the
double mutant (leaving 37 genes). The genes should be salt
stress responders and thus should show a 2-fold increase
of their wild-type expression under osmotic stress relative
to wild-type expression in synthetic complete medium.
This criterion reduced the candidate set to nine genes
(Figure 4D). The filtering criteria were chosen in
accordance with the expected model (Figure 4C). When
shuffling the arrays and applying the same criteria, we
find at most two genes, showing that the result is not
random. Four of the nine candidate genes are
uncharacterized ORFs (YDR366C, YJL107C, YMR034C and
YGR066C), and Bop2p and Spg4p are proteins of unknown
function. The other candidates are involved in a variety
of biological processes such as heme degradation,
pheromone-induced signaling and survival at high
temperature, or are annotated as a membrane protein
[Saccharomyces Genome Database (SGD) (55)]. This suggests
a novel function of these genes as part of the osmotic
stress response pathway, although their roles are unclear
and a Blastn/Blastp homology search did not help reveal
their function.
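
The first selection step of the mutant cycle analysis can be sketched as follows. This is a minimal illustration, assuming one log2 expression value per strain under osmotic stress, so that the linear model y = b0 + bG*xG + bA*xA + bGA*xG*xA (with xG and xA indicating the GCN4 and ARR1 deletions) is saturated and the interaction effect bGA reduces to a simple contrast. The function names, gene labels and values are hypothetical; the replicate handling and the further filters used in this study are not reproduced.

```python
from math import log2


def interaction_effect(wt, gcn4_del, arr1_del, double_del):
    """Interaction coefficient bGA of the saturated 2x2 model.

    wt = b0, gcn4_del = b0 + bG, arr1_del = b0 + bA and
    double_del = b0 + bG + bA + bGA, hence the contrast below.
    All inputs are log2 expression values under osmotic stress.
    """
    return double_del - gcn4_del - arr1_del + wt


def select_candidates(profiles, threshold=log2(1.5)):
    """Return genes whose interaction effect is positive and above threshold."""
    return [
        gene
        for gene, (wt, g, a, d) in profiles.items()
        if interaction_effect(wt, g, a, d) > threshold
    ]


# Hypothetical log2 profiles: (wild type, gcn4 deletion, arr1 deletion, double).
profiles = {
    "GENE_A": (3.0, 3.0, 1.0, 3.0),  # drop in arr1 deletion, rescued in double
    "GENE_B": (3.0, 2.5, 2.0, 1.5),  # purely additive deletion effects
}
print(select_candidates(profiles))
```

In this toy input, only the gene whose expression drops in the ARR1 deletion and returns to the wild-type level in the double deletion passes the threshold, which is the pattern expected under the model in Figure 4C.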



DISCUSSION 

OHC has been established as a method to predict 
condition-specific TFIs; its implementation, which does 
not require any parameter adjustments by the user, is 
provided as a software package for R. It takes advantage 
of the increasingly reliable and comprehensive resources 
on gene-specific transcriptional regulators. OHC is 
data-inexpensive; 'two' genome-wide gene activity meas- 
urements (under normal and stress conditions) are already 
sufficient. With this sparse input, we derive a robust inter- 
action measure that is stable on many different types of 
gene activity data. Despite its modest sensitivity, its pre- 
dictions are relevant due to their high specificity. 

Applied to osmotic stress data and TF-target relations
from YEASTRACT, OHC predicts 59 interactions, 23 of
which can be validated by BioGRID (22%).
Although gene activity data are available for many differ- 
ent conditions in all organisms, it may be difficult to find a 
mapping of TFs to a set of target genes suitable for OHC 
in other organisms. For the yeast S. cerevisiae, there are 
fortunately several options available, the most important 
being the YEASTRACT database (29) and the dataset 
provided by MacIsaac et al. (7). When we run the
method using the latter, we predict 38 interactions, only
6 of which can be validated by BioGRID (16% prediction
accuracy). While the annotation from MacIsaac et al.,
based on ChIP-chip data, is of high quality, it does
not suit our purpose, as it contains assignments made
under standard experimental conditions. YEASTRACT, in
contrast, contains many TF-target gene assignments made
under different stress conditions and in knockout strains.

It is important to note that the predictions made by
OHC are entirely different from predictions based on
target gene sets alone. Indeed, a straightforward Fisher
test for target gene overlap does not find the same TFIs
as OHC (data not shown). In particular, the method can
and does predict interactions between TFs that have no
overlap in target genes and thus no interaction score. This
is possible because we predict interactions based on profile
similarity, which takes into account the interaction scores
with all other TFs. We found three TFIs without target
gene overlap: Kar4p-Stb1p, Rds1p-YJL206C and
Cbf1p-Mig2p.

In silico validation of the method is based on all inter- 
actions between TFs submitted to BioGRID. As this re- 
pository is not exhaustive, the performance measurements 
in this article represent conservative estimates. Moreover, 
entries in BioGRID are biased toward interactions present 
under normal growth conditions and frequently studied 
stress conditions, as these account for the large part of 
the studies that contributed to BioGRID. We selected 
one novel candidate pair (Gcn4p-Arr1p) for in vivo
validation by growth assays under osmotic stress. The growth
defect in the arr1Δ strain showed a synthetic rescue
phenotype in the arr1Δ/gcn4Δ double deletion strain.
Subsequent gene expression analysis revealed nine
candidate genes potentially involved in the synthetic rescue,
none of which was previously connected to osmotic stress.

Application to a large dataset comprising 16 conditions
showed that different pairs are detected in each condition.
We compiled a compendium of confident condition-specific
interactions, where each pair has to be predicted
in at least half of the experiments for each condition
(stability). This provides a resource for studying functionally
relevant condition-specific TFIs. Since different inter- 
actions are predicted in different conditions, we confirm 
that TF combinatorics drive adaptation to environmental 
challenges. 
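
The stability criterion for the compendium can be illustrated with a short sketch. It assumes that the predictions for one condition are available as one collection of TF pairs per experiment; the function name, the data layout and the TF labels are hypothetical and do not reflect the interface of the OHC software package.

```python
from collections import Counter


def stable_pairs(experiments, min_fraction=0.5):
    """Keep TF pairs predicted in at least min_fraction of the experiments.

    `experiments` holds one collection of (tf1, tf2) tuples per experiment
    of a single condition. Pairs are treated as unordered.
    """
    counts = Counter()
    for predicted in experiments:
        # A set of frozensets counts each unordered pair once per experiment.
        counts.update({frozenset(pair) for pair in predicted})
    needed = min_fraction * len(experiments)
    return {pair for pair, n in counts.items() if n >= needed}


# Hypothetical predictions from four experiments of one condition.
experiments = [
    [("TF_A", "TF_B"), ("TF_C", "TF_D"), ("TF_E", "TF_F")],
    [("TF_B", "TF_A")],
    [("TF_C", "TF_D"), ("TF_A", "TF_B")],
    [],
]
print(stable_pairs(experiments))
```

With four experiments, a pair must be predicted at least twice to be retained, so the pair that appears only once is dropped.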

This method can be extended in several ways: first, the 
linear model from which the interaction score is derived 
can be replaced by a more elaborate physical model of TF 
activation, as has been attempted by (56,57). Currently 
these models fall short of describing TF competition ad- 
equately (58,59). Nonetheless, we speculate that inclusion 
of chromatin structure, in particular nucleosome position- 
ing, in the interaction score will improve the method. 
Second, OHC can be generalized to other organisms as
reliable TF-target annotations become available.
Finally, the screening principle introduced here lends 
itself to generalization: the only property of TFs that 
enters the model is that each TF splits the genes into 
two disjoint sets (targets versus non-targets), i.e. each 
TF defines a binary property on the set of genes. It is 
therefore straightforward to perform a condition-specific 
interaction screen on any collection of binary properties, 
such as pathway membership [e.g. KEGG (60)] or func- 
tional annotation [e.g. GO (61)]. 
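
To make the screening principle concrete, the sketch below computes an interaction term for two arbitrary binary gene properties. It is an illustration under stated assumptions rather than the OHC implementation: it assumes the saturated linear model y = b0 + b1*x1 + b2*x2 + b12*x1*x2 with binary indicators x1 and x2, for which the least-squares estimate of b12 is a difference of group means. The function name and the toy data are hypothetical, and the actual OHC interaction score and its significance assessment may differ.

```python
from statistics import mean


def interaction_coefficient(activity, members_a, members_b):
    """Least-squares estimate of b12 for two binary gene properties.

    `activity` maps gene -> activity change between case and control;
    `members_a` and `members_b` are the gene sets carrying each property
    (e.g. targets of a TF, members of a pathway, genes with a GO term).
    """
    groups = {(a, b): [] for a in (False, True) for b in (False, True)}
    for gene, value in activity.items():
        groups[(gene in members_a, gene in members_b)].append(value)
    if any(not values for values in groups.values()):
        raise ValueError("each of the four gene groups must be non-empty")
    m = {key: mean(values) for key, values in groups.items()}
    # Genes with both properties versus the additive expectation.
    return m[(True, True)] - m[(True, False)] - m[(False, True)] + m[(False, False)]


# Hypothetical activity changes for eight genes and two properties.
activity = {"g1": 0.0, "g2": 0.0, "g3": 1.0, "g4": 1.0,
            "g5": 2.0, "g6": 2.0, "g7": 5.0, "g8": 5.0}
members_a = {"g3", "g4", "g7", "g8"}
members_b = {"g5", "g6", "g7", "g8"}
print(interaction_coefficient(activity, members_a, members_b))
```

A coefficient near zero means that genes carrying both properties behave as expected from the genes carrying only one of them; a clear deviation from zero suggests a condition-specific interaction between the two properties.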